Scenario 1: sampling minibatches from fully observed datasets

Online iNMF is provided in the online branch of liger. Install that branch from GitHub using devtools:

library(devtools)
install_github("MacoskoLab/liger", ref = "online")

We first create a liger object by passing the filenames of HDF5 files containing the raw count data. The data can be downloaded here.

library(liger)

pbmcs = createLiger(list(stim="stim_PBMCs.h5",ctrl="ctrl_PBMCs.h5"))

We then perform the normalization, gene selection, and gene scaling in an online fashion, reading the data from disk in small batches.

pbmcs = normalize(pbmcs)
pbmcs = selectGenes(pbmcs, var.thresh = 0.2, do.plot = FALSE)
pbmcs = scaleNotCenter(pbmcs)
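
To illustrate what processing the data in small batches means, here is a minimal base R sketch that normalizes a count matrix one block of cells at a time. It is not liger's implementation: the in-memory matrix stands in for an HDF5 file, and normalize_in_chunks and chunk_size are hypothetical names. The sketch assumes that normalization divides each cell's counts by that cell's total, so every block can be processed independently of the others.

```r
# Hypothetical helper: normalize a genes x cells count matrix block by block.
normalize_in_chunks <- function(counts, chunk_size = 2) {
  n_cells <- ncol(counts)
  out <- matrix(0, nrow = nrow(counts), ncol = n_cells)
  for (s in seq(1, n_cells, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, n_cells)
    chunk <- counts[, idx, drop = FALSE]        # only this block is held at once
    out[, idx] <- sweep(chunk, 2, colSums(chunk), "/")
  }
  out
}

set.seed(1)
# Toy counts (5 genes x 7 cells); the +1 avoids cells with a zero total.
counts <- matrix(rpois(5 * 7, lambda = 3) + 1, nrow = 5, ncol = 7)

norm_chunked <- normalize_in_chunks(counts, chunk_size = 3)
norm_full <- sweep(counts, 2, colSums(counts), "/")   # all cells at once
```

Because each cell is normalized using only its own counts, the block-wise result matches the result computed on the full matrix.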

Online Integrative Nonnegative Matrix Factorization

Now we can use online iNMF to factorize the data, again using only minibatches that we read from the HDF5 files on demand.

pbmcs = online_iNMF(pbmcs, k=20, max.epochs = 5)
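
As a rough picture of how epochs and minibatches relate, the following base R sketch builds a schedule of cell indices. It assumes that an epoch is one complete pass through the cells and that the cells are reshuffled at the start of each epoch; make_minibatches, batch_size, and n_epochs are hypothetical names and are not part of liger.

```r
# Hypothetical helper: list the cell indices of each minibatch for every epoch.
make_minibatches <- function(n_cells, batch_size, n_epochs, seed = 1) {
  set.seed(seed)
  lapply(seq_len(n_epochs), function(epoch) {
    shuffled <- sample.int(n_cells)               # new random order each epoch
    split(shuffled, ceiling(seq_along(shuffled) / batch_size))
  })
}

schedule <- make_minibatches(n_cells = 10, batch_size = 4, n_epochs = 2)

# Sizes of the minibatches in the first epoch.
batch_sizes <- lengths(schedule[[1]])
```

Only the cells listed in the current minibatch need to be in memory at any point, which is what makes it possible to read them from disk on demand.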

Quantile Normalization and Downstream Analysis

After performing the factorization, we can perform quantile normalization to align the datasets.

pbmcs = quantile_norm(pbmcs)
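
The basic idea behind quantile matching can be sketched in a few lines of base R. This is only an illustration on a single vector, not what quantile_norm does internally (which operates on the factor loadings of whole datasets); match_quantiles is a hypothetical helper name.

```r
# Hypothetical helper: map the values of x onto the distribution of ref
# while preserving the ordering of x.
match_quantiles <- function(x, ref) {
  probs <- (rank(x) - 0.5) / length(x)            # position of each value in (0, 1)
  as.numeric(quantile(ref, probs = probs, names = FALSE))
}

set.seed(1)
ref <- rnorm(500)                                 # reference loadings
x <- rnorm(200, mean = 3, sd = 2)                 # loadings on a shifted scale

aligned <- match_quantiles(x, ref)
```

After the mapping, aligned follows the distribution of ref, but cells keep their relative order from x.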

We can also visualize the cell factor loadings in two dimensions using t-SNE or UMAP.

pbmcs = runUMAP(pbmcs)
plotByDatasetAndCluster(pbmcs, axis.labels = c("UMAP1","UMAP2"))

Scenario 2: iterative refinement by incorporating new datasets

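Online iNMF can also refine an existing factorization as new datasets arrive. We first factorize a single dataset, then incorporate a second dataset by passing it to online_iNMF as X_new.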

MOp = createLiger(list(cells="allen_smarter_cells.h5"))
MOp = normalize(MOp)
MOp = selectGenes(MOp, var.thresh = 2)
# Save the selected genes so they can be reused in Scenario 3
MOp.vargenes = MOp@var.genes
MOp = scaleNotCenter(MOp)
MOp = online_iNMF(MOp, k=40, max.epochs = 1)
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))

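# Preprocess the newly arrived dataset, reusing the variable genes selected from the first dataset
MOp2 = createLiger(list(nuclei="allen_smarter_nuclei.h5"))
MOp2 = normalize(MOp2)
MOp2@var.genes = MOp@var.genes
MOp2 = scaleNotCenter(MOp2)
# Update the existing factorization with the new dataset, passed by filename as X_new
MOp = online_iNMF(MOp, X_new = list(nuclei = "allen_smarter_nuclei.h5"), k = 40, max.epochs = 1)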
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
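
The step that ties the two datasets together is the reuse of the first dataset's variable genes for the second. The base R sketch below shows that idea on toy matrices. Ranking genes by raw variance is only a stand-in for selectGenes, and all object names here are hypothetical.

```r
set.seed(1)
genes <- paste0("gene", 1:6)

# Toy genes x cells matrices; the second has its genes in a different order.
cells_mat <- matrix(rexp(6 * 8), nrow = 6,
                    dimnames = list(genes, paste0("c", 1:8)))
nuclei_mat <- matrix(rexp(6 * 5), nrow = 6,
                     dimnames = list(rev(genes), paste0("n", 1:5)))

# Stand-in for gene selection: keep the 3 most variable genes of the first dataset.
gene_var <- apply(cells_mat, 1, var)
var_genes <- names(sort(gene_var, decreasing = TRUE))[1:3]

# Reuse the same genes, in the same order, for the newly arrived dataset.
cells_sel <- cells_mat[var_genes, , drop = FALSE]
nuclei_sel <- nuclei_mat[var_genes, , drop = FALSE]
```

Indexing by gene name rather than by position keeps the rows of both matrices aligned even when the files list genes in different orders.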

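Scenario 3: projecting new datasets

Instead of refining the factorization with the new data, we can project a new dataset onto the factors already learned. We first factorize the cells dataset alone, reusing the variable genes saved in Scenario 2 (MOp.vargenes), then pass the nuclei dataset as X_new with project = TRUE.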

MOp = createLiger(list(cells="allen_smarter_cells.h5"))
MOp@var.genes = MOp.vargenes
MOp = online_iNMF(MOp, k = 40, max.epochs = 1)
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))

MOp = online_iNMF(MOp, X_new = list(nuclei = "allen_smarter_nuclei.h5"), k = 40, project = TRUE)
MOp = quantile_norm(MOp)
MOp = runUMAP(MOp)
plotByDatasetAndCluster(MOp, axis.labels = c("UMAP1","UMAP2"))
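
To make the idea of projection concrete, the base R sketch below computes nonnegative loadings for new cells while a set of previously learned factors is held fixed. It assumes that projection means solving only for the new cells' loadings. It uses standard multiplicative updates for nonnegative least squares, which is a swapped-in technique chosen for brevity and not necessarily the solver liger uses. project_onto_factors and all other names are hypothetical.

```r
# Hypothetical helper: solve for nonnegative loadings H with the factors W held fixed.
# X is cells x genes, W is k x genes, and the result H is cells x k.
project_onto_factors <- function(X, W, n_iter = 200, eps = 1e-10) {
  H <- matrix(1, nrow = nrow(X), ncol = nrow(W))  # positive starting point
  XWt <- X %*% t(W)
  WWt <- W %*% t(W)
  for (i in seq_len(n_iter)) {
    H <- H * XWt / (H %*% WWt + eps)              # W is never modified
  }
  H
}

set.seed(1)
k <- 3; n_genes <- 20; n_cells <- 15
W <- matrix(runif(k * n_genes), nrow = k)         # factors learned earlier
H_true <- matrix(runif(n_cells * k), nrow = n_cells)
X_new <- H_true %*% W                             # toy "new dataset"

H_new <- project_onto_factors(X_new, W)

# Reconstruction error before (all-ones start) and after the updates.
err_start <- sum((X_new - matrix(1, n_cells, k) %*% W)^2)
err_end <- sum((X_new - H_new %*% W)^2)
```

Because W stays fixed, the new cells are placed in the same factor space as the original data, and the cost depends only on the size of the new dataset.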